Abstract


The coronavirus spike protein is a multifunctional molecular machine that 
mediates coronavirus entry into host cells. It first binds to a receptor on 
the host cell surface through its S1 subunit and then fuses viral and host 
membranes through its S2 subunit. Two domains in S1 from different coronaviruses
recognize a variety of host receptors, leading to viral attachment.
The spike protein exists in two structurally distinct conformations, prefusion 
and postfusion. The transition from prefusion to postfusion conformation of 
the spike protein must be triggered, leading to membrane fusion. This article 
reviews current knowledge about the structures and functions of coronavirus 
spike proteins, illustrating how the two S1 domains recognize different re- 
ceptors and how the spike proteins are regulated to undergo conformational 
transitions. I further discuss the evolution of these two critical functions of 
coronavirus spike proteins, receptor recognition and membrane fusion, in 
the context of the corresponding functions from other viruses and host cells. 
INTRODUCTION


Coronaviruses pose serious health threats to humans and other animals. From 2002 to 2003, severe 
acute respiratory syndrome coronavirus (SARS-CoV) infected 8,000 people, with a fatality rate 
of ~10% (1-4). Since 2012, Middle East respiratory syndrome coronavirus (MERS-CoV) has 
infected more than 1,700 people, with a fatality rate of ~36% (5, 6). Since 2013, porcine epidemic 
diarrhea coronavirus (PEDV) has swept throughout the United States, causing an almost 100% 
fatality rate in piglets and wiping out more than 10% of America’s pig population in less than a 
year (7-9). In general, coronaviruses cause widespread respiratory, gastrointestinal, and central 
nervous system diseases in humans and other animals, threatening human health and causing 
economic loss (10, 11). Coronaviruses are capable of adapting to new environments through 
mutation and recombination with relative ease and hence are programmed to alter host range 
and tissue tropism efficiently (12-14). Therefore, health threats from coronaviruses are constant 
and long-term. Understanding the virology of coronaviruses and controlling their spread have 
important implications for global health and economic stability. 

Coronaviruses belong to the family Coronaviridae in the order Nidovirales (10, 11). They can 
be classified into four genera: Alphacoronavirus, Betacoronavirus, Gammacoronavirus, and Deltacoronavirus
(Figure 1a). Among them, alpha- and betacoronaviruses infect mammals, gammacoronaviruses
infect avian species, and deltacoronaviruses infect both mammalian and avian species.
Representative alphacoronaviruses include human coronavirus NL63 (HCoV-NL63), porcine 
transmissible gastroenteritis coronavirus (TGEV), PEDV, and porcine respiratory coronavirus 
(PRCV). Representative betacoronaviruses include SARS-CoV, MERS-CoV, bat coronavirus 
HKU4, mouse hepatitis coronavirus (MHV), bovine coronavirus (BCoV), and human coron- 
avirus OC43. Representative gamma- and deltacoronaviruses include avian infectious bronchitis 
coronavirus (IBV) and porcine deltacoronavirus (PdCV), respectively. Coronaviruses are large, 
enveloped, positive-stranded RNA viruses. They have the largest genome among all RNA viruses, 
typically ranging from 27 to 32 kb. The genome is packed inside a helical capsid formed by the nu- 
cleocapsid protein (N) and further surrounded by an envelope. Associated with the viral envelope 
are at least three structural proteins: The membrane protein (M) and the envelope protein (E) 
are involved in virus assembly, whereas the spike protein (S) mediates virus entry into host cells. 
Some coronaviruses also encode an envelope-associated hemagglutinin-esterase protein (HE). 
Among these structural proteins, the spike forms large protrusions from the virus surface, giving 
coronaviruses the appearance of having crowns (hence their name; corona in Latin means crown) 
(Figures 1b and 2a). In addition to mediating virus entry, the spike is a critical determinant of 
viral host range and tissue tropism and a major inducer of host immune responses. 

The coronavirus spike contains three segments: a large ectodomain, a single-pass trans- 
membrane anchor, and a short intracellular tail (Figure 1b,c). The ectodomain consists of a 
receptor-binding subunit S1 and a membrane-fusion subunit S2. Electron microscopy studies
revealed that the spike is a clove-shaped trimer with three S1 heads and a trimeric S2 stalk (15-18)
(Figures 1b and 2a). During virus entry, S1 binds to a receptor on the host cell surface for viral
attachment, and S2 fuses the host and viral membranes, allowing viral genomes to enter host 
cells. Receptor binding and membrane fusion are the initial and critical steps in the coronavirus 
infection cycle; they also serve as primary targets for human interventions. In this article, I review
the structure and function of coronavirus spikes and discuss their evolution. 


RECEPTOR RECOGNITION BY CORONAVIRUS SPIKE PROTEINS 


Coronaviruses demonstrate a complex pattern for receptor recognition (19) (Figure 1d). For ex- 
ample, the alphacoronavirus HCoV-NL63 and the betacoronavirus SARS-CoV both recognize a 
Figure 1


Introduction to coronaviruses and their spike proteins. (a) Classification of coronaviruses. Representative coronaviruses in each genus
are human coronavirus NL63 (HCoV-NL63), porcine transmissible gastroenteritis coronavirus (TGEV), porcine epidemic diarrhea
coronavirus (PEDV), and porcine respiratory coronavirus (PRCV) in the genus Alphacoronavirus; severe acute respiratory
syndrome coronavirus (SARS-CoV), Middle East respiratory syndrome coronavirus (MERS-CoV), bat coronavirus HKU4, mouse
hepatitis coronavirus (MHV), bovine coronavirus (BCoV), and human coronavirus OC43 in the genus Betacoronavirus; avian infectious
bronchitis coronavirus (IBV) in the genus Gammacoronavirus; and porcine deltacoronavirus (PdCV) in the genus Deltacoronavirus.
(b) Schematic of the overall structure of prefusion coronavirus spikes. Shown are the receptor-binding subunit S1, the membrane-fusion
subunit S2, the transmembrane anchor (TM), the intracellular tail (IC), and the viral envelope. (c) Schematic of the domain structure of
coronavirus spikes, including the S1 N-terminal domain (S1-NTD), the S1 C-terminal domain (S1-CTD), the fusion peptide (FP), and
heptad repeat regions N and C (HR-N and HR-C). Scissors indicate two proteolysis sites in coronavirus spikes. (d) Summary of the
structures and functions of coronavirus spikes. Host receptors recognized by either of the S1 domains are angiotensin-converting
enzyme 2 (ACE2), aminopeptidase N (APN), dipeptidyl peptidase 4 (DPP4), carcinoembryonic antigen-related cell adhesion molecule
1 (CEACAM1), and sugar. The available crystal structures of S1 domains and S2 HRs are shown. Their PDB IDs are 3KBH for
HCoV-NL63 S1-CTD, 4F5C for PRCV S1-CTD, 2AJF for SARS-CoV S1-CTD, 4KR0 for MERS-CoV S1-CTD, 3R4D for MHV
S1-NTD, 4H14 for BCoV S1-NTD, 2IEQ for HCoV-NL63 HRs, 1WYY for SARS-CoV HRs, 4NJL for MERS-CoV HRs, and
1WDF for MHV HRs.
Figure 2


Cryo-electron microscopy structures of prefusion trimeric coronavirus spikes. (a) Trimeric mouse hepatitis
coronavirus (MHV) spike (PDB ID: 3JCL) (16). Three monomers are shown (magenta, cyan, and green).
(b) One monomer from the trimeric MHV spike. The important functional elements of the spike [the S1
N-terminal domain (S1-NTD), the S1 C-terminal domain (S1-CTD), the fusion peptide (FP), and the
heptad repeat (HR1)] are colored in the same way as in Figure 1c. The dotted curve indicates a disordered
loop. Scissors indicate two critical proteolysis sites.


zinc peptidase angiotensin-converting enzyme 2 (ACE2) (20, 21). Moreover, HCoV-NL63 and 
other alphacoronaviruses recognize different receptors: other alphacoronaviruses such as TGEV, 
PEDV, and PRCV recognize another zinc peptidase, aminopeptidase N (APN) (22-25). Sim- 
ilarly, SARS-CoV and other betacoronaviruses recognize different receptors: MERS-CoV and 
HKU4 recognize a serine peptidase, dipeptidyl peptidase 4 (DPP4) (26, 27); MHV recognizes a 
cell adhesion molecule, carcinoembryonic antigen-related cell adhesion molecule 1 (CEACAM1) 
(28, 29); BCoV and OC43 recognize sugar (30). The alphacoronaviruses TGEV and PEDV and 
the gammacoronavirus IBV also use sugar as receptors or coreceptors (23, 31-34). Other than 
their role in viral attachment, these coronavirus receptors have their own physiological func- 
tions (35-41). The diversity of receptor usage is an outstanding feature of coronaviruses. To 
further compound the complexity of the issue, the S1 subunits from different genera share little 
sequence similarity, whereas those from the same genus have significant sequence similarity (42). 
Therefore, the following questions have been raised regarding receptor recognition by coronaviruses:
(a) How do coronaviruses from different genera recognize the same receptor protein?
(b) How do coronaviruses from the same genus recognize different receptor proteins? (c) What 
is the molecular basis for coronavirus spikes to recognize sugar receptors and function as viral 
lectins? 
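The receptor-usage pattern described in this paragraph can be tabulated to make the two puzzles explicit. The following is a minimal Python sketch, not part of the original review: the table only restates the virus, genus, and receptor assignments given in the text, "sugar" is kept as a single generic category, and the function names are hypothetical.

```python
from collections import defaultdict

# (virus, genus, receptors) exactly as stated in the paragraph above.
# "sugar" lumps together all sugar receptors and coreceptors.
RECEPTOR_USAGE = [
    ("HCoV-NL63", "alpha", {"ACE2"}),
    ("SARS-CoV", "beta", {"ACE2"}),
    ("TGEV", "alpha", {"APN", "sugar"}),
    ("PEDV", "alpha", {"APN", "sugar"}),
    ("PRCV", "alpha", {"APN"}),
    ("MERS-CoV", "beta", {"DPP4"}),
    ("HKU4", "beta", {"DPP4"}),
    ("MHV", "beta", {"CEACAM1"}),
    ("BCoV", "beta", {"sugar"}),
    ("OC43", "beta", {"sugar"}),
    ("IBV", "gamma", {"sugar"}),
]


def genera_by_receptor(usage):
    """Map each receptor to the set of genera that use it."""
    out = defaultdict(set)
    for _virus, genus, receptors in usage:
        for receptor in receptors:
            out[receptor].add(genus)
    return dict(out)


def receptors_by_genus(usage):
    """Map each genus to the set of receptors used by its members."""
    out = defaultdict(set)
    for _virus, genus, receptors in usage:
        out[genus] |= receptors
    return dict(out)


def shared_across_genera(usage):
    """Receptors recognized by viruses from more than one genus."""
    by_receptor = genera_by_receptor(usage)
    return sorted(r for r, genera in by_receptor.items() if len(genera) > 1)


if __name__ == "__main__":
    print(shared_across_genera(RECEPTOR_USAGE))
    print(receptors_by_genus(RECEPTOR_USAGE)["beta"])
```

Read this way, question (a) corresponds to receptors shared across genera (ACE2 and sugar in this table), and question (b) corresponds to a single genus mapping to several receptors (the betacoronaviruses listed here use four).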

Two major domains in coronavirus S1, the N-terminal domain (S1-NTD) and the C-terminal domain
(S1-CTD), have been identified (Figure 1c,d). One or both of these S1 domains potentially
bind receptors and function as the receptor-binding domain (RBD). S1-NTDs are responsible
for binding sugar (23, 34, 43, 44), with the only known exception being the betacoronavirus MHV
S1-NTD, which recognizes a protein receptor, CEACAM1 (45). S1-CTDs are responsible for recognizing
protein receptors ACE2, APN, and DPP4 (23, 46-51). Crystal structures have been
determined for a number of S1 domains complexed with their respective receptor (Figure 1d). 
These structures, along with functional studies, have addressed many of the puzzles surrounding 
receptor recognition by coronaviruses. 
RECEPTOR RECOGNITION BY CORONAVIRUS S1-CTDs


The structure of betacoronavirus SARS-CoV S1-CTD complexed with human ACE2 provided 
the first atomic view of coronavirus S1 (52, 53) (Figure 3a). SARS-CoV S1-CTD contains 
two subdomains: a core structure and a receptor-binding motif (RBM). The core structure is a 
Figure 3


Crystal structures of betacoronavirus S1 C-terminal domains (S1-CTDs). (a) Structure of severe acute respiratory syndrome
coronavirus (SARS-CoV) S1-CTD complexed with human ACE2 (PDB ID: 2AJF) (52). Shown are the core structure of S1-CTD
(cyan), the receptor-binding motif (red), and ACE2 (green). (b) Interface between human SARS-CoV S1-CTD and human ACE2,
showing two virus-binding hot spots on human ACE2. Dashed lines indicate salt bridges. (c) Interface between palm civet SARS-CoV
S1-CTD and human ACE2. Critical residue changes from human to civet SARS-CoV strains are labeled. (d) Interface between human
SARS-CoV S1-CTD and rat or mouse ACE2. Critical residue changes from human to rat or mouse ACE2 are labeled. (e) Structure of
Middle East respiratory syndrome coronavirus (MERS-CoV) S1-CTD complexed with human DPP4 (PDB ID: 4KR0) (69).
five-stranded antiparallel β-sheet. The RBM presents a gently concave outer surface to bind ACE2.
The base of this concave surface is a short, two-stranded antiparallel β-sheet, and two ridges are
formed by loops. The ectodomain of ACE2 contains a membrane-distal peptidase domain and 
a membrane-proximal collectrin domain (54). Several virus-binding motifs (VBMs) have been 
identified on the outer surface of the peptidase domain, away from the buried peptidase catalytic 
site (52). SARS-CoV binding does not interfere with the enzymatic activity of ACE2, nor does 
the enzymatic activity of ACE2 play any role in SARS-CoV entry (55). 

Research on SARS-CoV-ACE2 interactions has provided novel insight into cross-species 
transmissions of SARS-CoV. During the SARS epidemic, highly similar SARS-CoV strains were 
isolated from both human patients and palm civets from nearby animal markets (56). Their 
S1-CTDs differ by only two residues in the RBM region: Asn479 and Thr487 in human viral 
strains become Lys479 and Ser487 in civet viral strains, respectively (Figure 3b,c). However,
human SARS-CoV S1-CTD binds to human ACE2 much more tightly than civet SARS-CoV 
S1-CTD does. Two virus-binding hot spots have been identified on human ACE2, centering 
on ACE2 residues Lys31 and Lys353, respectively (57-59) (Figure 3b). Both hot spots consist
of a salt bridge buried in a hydrophobic environment and contribute critically to virus-receptor
binding. Residues 479 and 487 in SARS-CoV S1-CTD interact closely with these hot spots 
and are under selective pressure to mutate. Two naturally selected viral mutations, K479N and 
S487T, strengthened the hot spot structures and enhanced the binding affinity of S1-CTD for
human ACE2 (55, 57-59) (Figure 3c). Consequently, these two mutations played important 
roles in the civet-to-human and human-to-human transmissions of SARS-CoV during the SARS 
epidemic (13, 55, 57-61). Compared to human ACE2, rat ACE2 contains two different residues 
that disfavor SARS-CoV binding: His353 disturbs the hot spot structure centering on Lys353, 
whereas Asn82 introduces an N-linked glycan, presenting steric interference with SARS-CoV 
binding (52) (Figure 3d). Mouse ACE2 also contains His353 but does not have the N-linked 
glycan at the 82 position. Thus, rat ACE2 is not a receptor for SARS-CoV, whereas mouse ACE2 
is a poor receptor. Consequently, SARS-CoV does not infect rat cells, and it infects mouse cells 
inefficiently (62, 63). SARS-like coronaviruses (SLCoVs) have been identified in bats, and some 
can infect human cells (64-68). Structural details on how these bat SLCoV S1-CTDs interact 
with ACE2 from different mammalian species remain to be determined. Overall, these studies
on SARS-CoV-ACE2 interactions reveal that (a) one or a few mutations in viral RBDs can cause
serious epidemic outcomes and (b) one or a few residue variations in receptor homologs from
different animal species can form critical barriers for cross-species transmissions of viruses. 
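The species barrier described above, in which one or two residue differences in ACE2 determine receptor competence, can be written as a small lookup. This is an illustrative Python sketch only: the residue data restate the text (Lys353 in human ACE2; His353 in rat and mouse ACE2; an N-linked glycan at Asn82 in rat ACE2 only), whereas the rule of counting disfavoring features is an assumption chosen to reproduce the three cases mentioned, not a published scoring scheme. The names `Ace2Variant`, `disfavoring_features`, and `classify` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ace2Variant:
    species: str
    residue_353: str    # three-letter residue code at position 353
    glycan_at_82: bool  # True if Asn82 introduces an N-linked glycan


# Residue data as described in the text.
VARIANTS = [
    Ace2Variant("human", "Lys", False),
    Ace2Variant("mouse", "His", False),
    Ace2Variant("rat", "His", True),
]

# Assumed toy rule: 0 disfavoring features -> effective, 1 -> poor, 2 -> none.
LABELS = {0: "effective receptor", 1: "poor receptor", 2: "not a receptor"}


def disfavoring_features(variant):
    """List the features that the text says disfavor SARS-CoV binding."""
    features = []
    if variant.residue_353 == "His":
        features.append("His353 disturbs the hot spot centering on Lys353")
    if variant.glycan_at_82:
        features.append("N-linked glycan at Asn82 sterically interferes")
    return features


def classify(variant):
    """Map the number of disfavoring features to the outcome in the text."""
    return LABELS[len(disfavoring_features(variant))]


if __name__ == "__main__":
    for v in VARIANTS:
        print(f"{v.species}: {classify(v)}")
```

The sketch is deliberately qualitative; actual receptor activity depends on the full interface and would need to be measured rather than inferred from two positions.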

The structure of betacoronavirus MERS-CoV S1-CTD complexed with human DPP4, when 
compared with SARS-CoV, presented an interesting example of how two structurally similar 
viral RBDs recognize different protein receptors (Figure 3a,e). Like SARS-CoV S1-CTD,
MERS-CoV S1-CTD also contains two subdomains, a core structure and an RBM (69-71). The 
core structures of MERS-CoV and SARS-CoV S1-CTDs are similar to each other, whereas their 
RBMs are markedly different. In contrast to the loop-dominated and gently concave surface of 
SARS-CoV RBM, MERS-CoV RBM consists of a four-stranded antiparallel β-sheet, presenting
a relatively flat surface to bind DPP4. On the other hand, DPP4 forms a homodimer and each
monomer contains a hydrolase domain and a β-propeller domain (72). The VBMs are located
on the outer surface of the β-propeller domain, away from the peptidase catalytic site. The
variations of VBM residues on DPP4 homologs from different mammalian species pose a barrier 
for cross-species transmissions of MERS-CoV. For example, mouse and rat DPP4 molecules 
are both poor receptors for MERS-CoV because they each contain a number of VBM residues 
that disfavor MERS-CoV binding (73-75). Camel DPP4 is an effective receptor for MERS-CoV 
due to its conserved VBM residues (76). Indeed, MERS-CoV has been isolated from camels, 
suggesting a camel-to-human transmission of MERS-CoV (77, 78). Several MERS-related
coronaviruses have been isolated from bats (79-81). Among them, HKU4 recognizes DPP4 
using a structural mechanism similar to that used by MERS-CoV, indicating a bat origin of 
MERS-CoV (27, 82). Overall, these studies reveal that viral RBDs with a conserved core structure 
can recognize different receptors through structural variations in their RBM, and they also 
reinforce the concept that receptor recognition is a critical determinant of viral host ranges. 

The structure of alphacoronavirus HCoV-NL63 S1-CTD complexed with human ACE2, 
when compared with SARS-CoV, showed how two structurally divergent viral RBDs recognize the 
same protein receptor (83) (Figures 3a and 4a). HCoV-NL63 S1-CTD contains a core structure 
and three RBM loops. The core structure of HCoV-NL63 S1-CTD is a β-sandwich consisting of
two three-stranded antiparallel β-sheets. It differs from the core structure of SARS-CoV S1-CTD,
which consists of a single-layer, five-stranded β-sheet. The RBMs of HCoV-NL63 S1-CTD are
three short, discontinuous loops. They differ from the RBM of SARS-CoV S1-CTD, which 
is a long, continuous subdomain. Nevertheless, HCoV-NL63 and SARS-CoV S1-CTDs share 
the same structural topology (connectivity of secondary structural elements) (42) (Figure 4c,d). 
Despite their different structures, HCoV-NL63 and SARS-CoV S1-CTDs bind to the same 
VBMs on human ACE2 (Figure 4e,f). Between the two SARS-CoV-binding hot spots on human 
ACE2, the hot spot centering on Lys353 also plays a critical role in HCoV-NL63 binding (59). 
Consequently, as with SARS-CoV, entry of HCoV-NL63 into mouse cells is inefficient due to 
the presence of His353 on mouse ACE2 (48, 83). These studies demonstrate that viral RBDs with 
different structures can bind to a common virus-binding hot spot on the same protein receptor. 

The structure of alphacoronavirus PRCV S1-CTD complexed with porcine APN, when com- 
pared with HCoV-NL63, presented another example of how two similar coronavirus RBDs bind 
to different protein receptors (84) (Figure 4a,b). Like HCoV-NL63 S1-CTD, PRCV S1-CTD 
contains a β-sandwich core structure and three RBM loops. The core structures of PRCV and
HCoV-NL63 S1-CTDs are similar to each other, but their RBMs are divergent, leading to dif- 
ferent receptor specificities. The ectodomain of APN has a seahorse-shaped structure and forms 
a head-to-head dimer (85, 86). The VBMs are located on the outer surface of APN, away from 
the buried APN catalytic site. Several other alphacoronaviruses, such as TGEV, PEDV, human 
coronavirus 229E, feline coronavirus, and canine coronavirus, recognize APN from their natural 
host as their receptor (22-25, 87). These APN-recognizing alphacoronaviruses, except for PEDV, 
have been shown to also recognize feline APN, suggesting transmission of feline coronavirus from 
cats to other mammals (87). These studies showed again that similar viral RBDs with a conserved 
core structure can recognize different protein receptors through structurally divergent RBMs. 

The above studies provide insight into the evolution of coronavirus S1-CTDs (19). Although
the core structures of alpha- and betacoronavirus S1-CTDs are a β-sandwich and a single-layer
β-sheet, respectively, they share the same structural topology, suggesting a common evolutionary
origin. The S1-CTDs from different genera likely have undergone extensive divergent evolution
to attain structurally different core structures. The three RBM loops of alphacoronavirus
S1-CTDs might have further diverged into ACE2-binding RBMs in HCoV-NL63 and APN-binding
RBMs in PRCV. The RBM subdomain of betacoronavirus S1-CTDs might also have
diverged into ACE2-binding RBM in SARS-CoV and DPP4-binding RBM in MERS-CoV. Despite
their different structures, alphacoronavirus HCoV-NL63 and betacoronavirus SARS-CoV
S1-CTDs bind to a common region on ACE2, possibly driven by the common virus-binding hot
spot on ACE2. The tertiary structures of the S1-CTDs from gamma- and deltacoronaviruses
are unavailable but are likely related to the folds of alpha- and betacoronavirus S1-CTDs. The
complex evolutionary relationships among the S1-CTDs from different genera reflect the heavy
evolutionary pressure on this domain, which is discussed in more detail in this review.


www.annualreviews.org • Coronavirus Receptor Recognition and Cell Entry


Changes may still occur before final publication online and in print 




Annu. Rev. Virol. 2016.3. Downloaded from www.annualreviews.org 


VI03CH27-Li ARI 17 August 2016 18:20


RECEPTOR RECOGNITION BY CORONAVIRUS S1-NTDS 


The structure of betacoronavirus MHV S1-NTD complexed with murine CEACAM1 provided 
the first atomic view of coronavirus S1-NTDs (88) (Figure 5a). MHV S1-NTD consists of a β-sandwich
core structure and a ceiling-like structure on top of it. The core structure contains two
antiparallel β-sheets, one with six β-strands and the other with seven. MHV S1-NTD has the same
Figure 5


Crystal structures of betacoronavirus S1 N-terminal domains (S1-NTDs). (a) Structure of mouse hepatitis coronavirus (MHV)
S1-NTD complexed with murine CEACAM1 (PDB ID: 3R4D) (88). The core structure of MHV S1-NTD is shown in magenta and
green, the receptor-binding motifs in red, and the rest in cyan. The N-terminal immunoglobulin domain of CEACAM1 is shown in
yellow and virus-binding motifs in blue. (b) Structure of bovine coronavirus (BCoV) S1-NTD (PDB ID: 4H14) (43). The asterisk
indicates the binding site for sugar receptor Neu5,9Ac2. (c) Structure of human galectin-3 complexed with galactose (PDB ID: 1A3K).
Sugar receptor is shown in blue. (d) Structure of influenza virus HA1 (PDB ID: 1JSO). (e-g) Structural topologies of (e)
betacoronavirus S1-NTDs, (f) human galectins, and (g) influenza virus HA1.


structural fold as human galectins (galactose-binding lectins) (Figure 5a,c,e,f). The RBMs are 
located on the outer surface of the ceiling-like structure. On the other hand, CEACAM1 contains
either two or four immunoglobulin (Ig) domains (89, 90). The VBMs are located on the membrane- 
distal surface of the N-terminal Ig domain of CEACAM1. Despite its galectin fold, MHV S1-NTD 




Figure 4 


Crystal structures of alphacoronavirus S1 C-terminal domains (S1-CTDs). (a) Structure of human coronavirus NL63 (HCoV-NL63)
S1-CTD complexed with human ACE2 (PDB ID: 3KBH) (83). (b) Structure of porcine respiratory coronavirus (PRCV) S1-CTD
complexed with porcine APN (PDB ID: 4F5C) (84). (c) Structural topology of alphacoronavirus S1-CTDs. β-Strands are shown as
arrows. (d) Structural topology of betacoronavirus S1-CTDs. α-Helices are shown as cylinders. All of the secondary structural
elements in panels c and d are connected in the same order, even though strands β4, β1, and β2 in panel c become helices α4, α1, and a
loop, respectively, in panel d. (e) Interface between HCoV-NL63 S1-CTD and human ACE2. Virus-binding motifs (VBMs) on ACE2
are shown in blue. Receptor-binding motifs (RBMs) on S1-CTD are shown in red. (f) Interface between SARS-CoV S1-CTD and
human ACE2.
binds to CEACAM1 through exclusive protein-protein interactions. Structural and mutagenesis
studies have identified two critical hydrophobic patches at the S1-NTD/CEACAM1 interface
(88, 90-93). Critical RBM residues are conserved in the S1-NTDs from different MHV strains, 
including hepatotropic strain A59 and neurotropic strain JHM, allowing these MHV strains to 
use CEACAM1 as their receptor (94). Several critical VBM residues differ between two forms 
of CEACAM1 encoded by mice, CEACAM1a and CEACAM1b, rendering CEACAM1a, but not
CEACAM1b, an effective receptor for MHV (95-97). CEACAM1a molecules from mouse,
cattle, and human also differ in several critical VBM residues, presenting a barrier for cross-species 
transmissions of MHV (88). These studies reveal a surprising galectin fold of MHV S1-NTD and 
provide insight into the host range and tissue tropism of MHV. 

The structure of betacoronavirus BCoV S1-NTD illustrated a functional coronavirus spike 
lectin domain (43) (Figure 5b). BCoV S1-NTD has a galectin fold similar to that of MHV 
S1-NTD. However, unlike MHV S1-NTD, which recognizes a protein receptor, BCoV S1- 
NTD recognizes a sugar receptor. Mutagenesis studies have identified the sugar-binding pocket 
in the cavity surrounded by the core structure and the ceiling-like structure on top of it. The 
sugar-binding site in BCoV S1-NTD overlaps with that in human galectins, although galectins 
do not have the ceiling-like structure and consequently their sugar-binding site is open (98). 
BCoV S1-NTD does not recognize galactose as galectins do. Instead, it recognizes 5-N-acetyl- 
9-O-acetylneuraminic acid (Neu5,9Ac2) (30, 43). The same sugar receptor is also recognized by 
human coronavirus OC43 (43, 99). OC43 and BCoV are closely related genetically, and OC43 
might have resulted from zoonotic spillover of BCoV (100, 101). Because Neu5,9Ac2 is widely 
expressed in mammalian tissues (102), recognition of this sugar receptor might have played a role 
in the cattle-to-human transmission of BCoV. In contrast, despite the structural similarity between 
MHV and BCoV S1-NTDs, MHV S1-NTD binds CEACAM1 but not sugar (43, 88). MHV S1- 
NTD does not bind sugar, because a critical sugar-binding loop in BCoV S1-NTD has a different 
conformation in MHV S1-NTD. Similarly, BCoV S1-NTD does not bind CEACAM1, because 
the CEACAM1-binding RBMs in MHV S1-NTD have undergone conformational changes in
BCoV S1-NTD. Therefore, despite their common galectin fold, betacoronavirus S1-NTDs can 
recognize either a protein receptor or a sugar receptor depending on the conformation of their 
RBMs. 

The structures of S1-NTDs from alpha-, gamma-, and deltacoronaviruses are unavailable, 
but on the basis of the following observations, they probably all have a galectin fold. First, the 
related structural topology between alpha- and betacoronavirus S1-CTDs suggests that the S1 
subunits across different genera have a common evolutionary origin (42). Second, the S1-NTDs 
from the alphacoronaviruses TGEV and PEDV and the gammacoronavirus IBV all recognize
sugar receptors, although TGEV S1-NTD recognizes N-glycolylneuraminic acid (Neu5Gc) and
N-acetylneuraminic acid (Neu5Ac), PEDV S1-NTD recognizes Neu5Ac, and IBV S1-NTD recognizes
Neu5Gc (23, 31, 34, 44). Therefore, the S1-NTDs from different genera are likely all
evolutionarily and structurally related. 

Based on the above studies, the following evolutionary scenario has been proposed for coron- 
avirus S1-NTDs (19). Through gene capture, ancestral coronaviruses might have acquired a host 
galectin, which would become the S1-NTD of their spikes. Consequently, coronaviruses would
recognize sugar receptors for cell entry. To aid viral release from infected cells, some coronaviruses 
would also evolve a hemagglutinin-esterase protein (HE) as a receptor-destroying enzyme. Later, 
coronavirus S1-NTDs would evolve a ceiling-like structure that could protect their sugar-binding 
site from host immune surveillance; this ceiling-like structure is absent in galectins because as 
host proteins, galectins are not recognized by the host immune system. The outer surface of the 
ceiling-like structure in MHV would further evolve structural features that could function as 
RBMs and recognize CEACAM1. Because protein receptors in general provide higher affinity
and specificity for viral attachment than sugar receptors do, the sugar-binding function of MHV 
S1-NTD would become dispensable and be lost. The S1-NTDs of some other contemporary 
coronaviruses, such as the alphacoronaviruses TGEV and PEDV, the betacoronaviruses BCoV 
and OC43, and the gammacoronavirus IBV, have retained the lectin function, but their sugar 
specificities have diverged. The galectin-like domain has also been found in a number of other 
viral spikes (103) (Figure 5d,g), including influenza virus HA, rotavirus VP4, and the adenovirus 
galectin domain. These viral galectin-like domains display diverse sugar-binding modes, but their 
sugar-binding sites are all located in cavities, possibly to evade host immune surveillance. Overall, 
it appears to be a successful strategy for viruses to acquire a host lectin and evolve it into viral RBDs 
with novel specificity for a protein receptor or altered specificity for a different sugar receptor. 


STRUCTURAL MECHANISM FOR MEMBRANE FUSION 
BY CORONAVIRUS SPIKE PROTEINS 


The coronavirus spike is believed to be a member of the class I viral membrane fusion proteins that 
also include those from influenza virus, human immunodeficiency virus (HIV), and Ebola virus 
(Figure 6a). Among these proteins, the hemagglutinin glycoprotein (HA) of the influenza virus is 
arguably the best studied (104, 105). HA is expressed as a single-chain precursor. During molecular 
maturation, it trimerizes and is cleaved by host proteases into receptor-binding subunit HA1 and 
membrane-fusion subunit HA2, which still associate together through noncovalent interactions 
(106, 107). This prefusion state of HA on the newly packaged virions is a proteolytically primed 
and metastable trimer. During cell entry, HA1 binds to a sugar receptor on the host cell surface 
for viral attachment, and then HA1 dissociates and HA2 undergoes a dramatic conformational
change to transition to the postfusion state. During this transition, three pairs of heptad repeat 
regions HR-N and HR-C in trimeric HA2 form a six-helix bundle structure. Three previously 
buried hydrophobic fusion peptides in trimeric HA2 become exposed and insert into the target 
host membrane. The fusion peptides and transmembrane anchors are eventually positioned on the 
same end of the six-helix bundle, bringing the viral and host membranes together to fuse. Because 
the six-helix bundle structure is energetically stable, a large amount of energy is released during the 
conformational transition of HA, driving membrane fusion forward. However, an initial energy 
barrier for the conformational transition of HA must be overcome through the aforementioned 
proteolytic priming and one or more triggers. These triggers can be receptor binding (e.g., HIV) 
(108), low pH (e.g., influenza virus) (109), or a combination of the two (e.g., avian leucosis virus) 
(110). Consequently, membrane fusion occurs either on the host cell surface (e.g., HIV) or in the 
endosomes (e.g., influenza virus). 

The prefusion structures of betacoronavirus MHV and HKU1 spikes were recently determined 
using cryo-electron microscopy (15, 16) (Figure 2a). The overall architecture of the prefusion 
coronavirus spikes is similar to, albeit significantly larger and more complex than, that of influenza 
virus HA. In each spike, three S1 heads sit on top of a trimeric S2 stalk, preventing S2 from 
undergoing conformational transitions. Between the two major S1 domains, S1-CTD is located 
at the very top of the spike, whereas S1-NTD directly contacts and structurally constrains S2. In 
the S2 stalk, HR-N forms several helices and arranges itself along the central threefold symmetry 
axis of trimeric S2, whereas HR-C is poorly ordered. Unlike the fusion peptide in influenza virus
HA, which is located at the N terminus of HA2, the fusion peptide in coronavirus spikes is located 
downstream from the N terminus of S2 and is hence an internal fusion peptide. This fusion peptide 
forms a short helix and a loop, with most of the hydrophobic residues buried inside the prefusion 
structure. Two proteolysis sites are essential for the conformational transition of S2 (111, 112),
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Structural mechanism for membrane fusion by coronavirus spikes. (a) Structural mechanism for membrane fusion by class I viral 
membrane fusion proteins. Schematics of these proteins in both prefusion and postfusion conformations are shown. (b) Negative-stain 
electron microscopy images of SARS-CoV spike in both prefusion and postfusion conformations are shown (18). (c) Schematics of 
SARS-CoV postfusion S2 in solution (/eft) and in vivo (right). Abbreviations: FP, fusion peptide; HR-C, heptad repeat region C; HR-N, 
heptad repeat region N; IC, intracellular tail; SARS-CoV, severe acute respiratory syndrome coronavirus; TM, transmembrane anchor. 


one at the S1/S2 boundary and the other at the N terminus of the internal fusion peptide. Like 
prefusion influenza virus HA, the prefusion coronavirus spike is in a metastable state and primed 
to undergo conformational transitions for membrane fusion. 

Atomic models of full-length postfusion coronavirus spikes are not available, but a negative- 
stain electron microscopy study on SARS-CoV spike provided a direct view of its conformational 
transition that was likely associated with membrane fusion during virus entry (18) (Figure 6). 
In vitro triggers (e.g., trypsin cleavage and urea incubation) induce the prefusion SARS-CoV 
spike to transition to its postfusion state, in which S1 dissociates and S2 forms a dumbbell-shaped 
structure with a rod-like structure in the middle and a globular structure at each end. These 
trimeric S2 molecules further associate at one end to form rosettes in solution. Comparison with 
influenza virus HA suggests that the rod-like structure in the middle likely represents the six-helix 
bundle formed by HR-N and HR-C, whereas the globular structures at the two ends likely 
correspond to the regions N-terminal to HR-N and between HR-N and HR-C, respectively 
(Figures 1c and 6c). Three hydrophobic fusion peptides, located N-terminal to HR-N, were 
previously buried in the prefusion spike but become exposed in the postfusion S2 and associate to 
form rosettes in solution (they would insert into host membranes, were the membranes present). 
Crystal structures of the six-helix bundle have been determined for a number of coronaviruses 
such as MHV, SARS-CoV, MERS-CoV, and HCoV-NL63 (113-119) (Figures 1d and 6c). 
Compared with influenza virus HA, the six-helix bundle formed by coronavirus HR-N and HR-C 
is unusually long (18, 117), suggesting that a large amount of energy can be released during 
the conformational transition of S2 and become available for membrane fusion. Overall, 
these studies reveal that coronavirus spikes fuse membranes using the same structural mechanism 
as other class I membrane fusion proteins, although they have some unique features such as their 
large size, internal fusion peptide, double cleavage sites, and long six-helix bundle. 


TRIGGERS FOR MEMBRANE FUSION BY CORONAVIRUS 
SPIKE PROTEINS 


The triggers for coronavirus spikes to undergo conformational transitions follow a more 
complex pattern than those of many other class I membrane fusion proteins, probably reflecting their 
unique structural features discussed above. For example, although influenza virus HA is primed by 
proteolysis during virus packaging, many coronavirus spikes are not. Instead, all coronavirus spikes 
are subjected to proteolysis later in the cell entry process, sometimes after receptor binding. Thus, 
proteolysis of coronavirus spikes can lead directly to membrane fusion and thereby serves as an 
essential trigger for membrane fusion (111, 112). The host proteases that cleave coronavirus spikes 
mainly come from four different stages of the virus infection cycle: (a) proprotein convertases (e.g., 
furin) during virus packaging in virus-producing cells, (b) extracellular proteases (e.g., elastase) after 
virus release into extracellular space, (c) cell surface proteases [e.g., type II transmembrane serine 
protease (TMPRSS2)] after virus attachment to virus-targeting cells, and (d) lysosomal proteases 
(e.g., cathepsin L and cathepsin B) after virus endocytosis in virus-targeting cells (120) (Figure 7). 
In addition to proteolysis, traditional triggers such as receptor binding and low pH may also play 
a role in membrane fusion. Here, I discuss detailed triggers for membrane fusion by the spikes 
from three representative coronaviruses. 

As the prototypic coronavirus, MHV has been extensively examined for its cell entry mechanism. 
The findings are complicated and in some cases contradictory. First, MHV spike is cleaved 
by proprotein convertases during virus packaging in the virus-producing cells (121). This 
proteolysis is critical for MHV entry into virus-targeting cells. Second, the binding of CEACAM1 
triggers the conformational transition of MHV spike and hence membrane fusion. This is 
supported by the observations that incubation of MHV spike with recombinant soluble CEACAM1 
led to enhanced hydrophobicity of MHV S2 and the appearance of a protease-resistant S2 
fragment (122, 123). These observations indicated the exposure of fusion peptides and formation 
of the six-helix bundle, respectively, in postfusion S2. Third, there have been contradictory 
reports about whether MHV enters target cells at the cell membrane or through endocytosis and 
whether MHV spike undergoes conformational transitions at low, neutral, or even elevated pH 
(123-125). This may depend on the MHV strains examined or experimental approaches used in 
the studies. The roles of pH and endocytosis in MHV entry still need to be further clarified. 
Interestingly, the neurotropic strain MHV-JHM can mediate virus entry into host cells that do 
not express CEACAM1 (126-128). This receptor-independent entry by MHV-JHM is unique 
among viruses. Biochemical characterization of MHV-JHM spike suggested that it is more labile 
than the spikes from other MHV strains, meaning that it undergoes conformational transitions 
spontaneously in the absence of the receptor (94, 129, 130). It is believed that MHV-JHM is able 
to infect neural cells where CEACAM1 expression level is very low, at least in part because its 
spike can mediate receptor-independent entry (131, 132). Taken together, the membrane fusion 
mechanism of MHV spike depends on both proteolysis and receptor binding, and it may or may 
not depend on the low pH of endosomes; in addition, receptor-independent membrane fusion by 
MHV-JHM spike contributes to the neural tropism of MHV-JHM. 

Figure 7 
Triggers for coronavirus spikes to fuse membranes. Scissors indicate potential spike-processing host proteases. Shown are virus 
particles (green spheres), virus surface spikes (blue protrusions), viral genome (magenta coils), cells (large gray circles), endosome/lysosome 
(small shaded gray circle), and receptor (light green base on cell surface). Spike-processing host proteases are labeled for representative 
coronaviruses: Middle East respiratory syndrome coronavirus (MERS-CoV), mouse hepatitis coronavirus (MHV), and severe acute 
respiratory syndrome coronavirus (SARS-CoV). 

Research on the cell entry mechanism of SARS-CoV has led to novel findings. First, SARS-CoV 
spike is not cleaved by proprotein convertases during virus packaging and hence remains intact 
on mature virions (133, 134). Instead, SARS-CoV enters host cells through endocytosis, and its 
spike is processed by lysosomal proteases (e.g., cathepsin L and cathepsin B) (135-137). This is 
supported by the observation that inhibitors against either endosomal acidification or lysosomal 
cysteine proteases block SARS-CoV entry. However, low pH itself is not a trigger for SARS-CoV 
entry. This is supported by the observation that when expressed on the cell surface and cleaved 
by exogenous proteases, SARS-CoV spike can mediate cell-cell fusion with ACE2-expressing 
cells at neutral pH (136). Thus, the role of low pH in SARS-CoV entry is to activate lysosomal 
proteases, which further activate SARS-CoV spike for membrane fusion. This is different from 
influenza virus HA, which is activated through binding protons in the low-pH environment of 
endosomes. Second, in addition to lysosomal proteases, both extracellular proteases (e.g., elastases 
in the respiratory tract) and cell surface proteases (e.g., TMPRSS2 on the surface of lung cells) also 
activate SARS-CoV spike for membrane fusion (138-142). Due to their cell and tissue specificities, 
these proteases likely contribute to the respiratory tract and lung tropism of SARS-CoV. Third, 
in addition to the cleavage site at the S1/S2 boundary, a second site, S2’, has been identified at the 
N terminus of the internal fusion peptide within S2 (143, 144). Whereas the cleavage at the S1/S2 
boundary removes the structural constraint of S1 on S2, the cleavage at the S2’ site releases the 
internal fusion peptide for insertion into target membranes (Figures 1c and 2b). Fourth, it is not 
clear whether the binding of receptor ACE2 is a trigger for SARS-CoV spike to fuse membranes. 
Two electron microscopy studies observed no or moderate conformational changes of SARS- 
CoV spike associated with ACE2 binding (18, 145). However, other studies suggested that ACE2 
binding triggers a conformational change in SARS-CoV spike, which exposes previously cryptic 
protease sites for cleavage (135, 146). Although the role of ACE2 binding in triggering membrane 
fusion awaits further investigation, SARS-CoV entry does not depend on low pH, but it requires 
at least two protease cleavages in the spike by lysosomal proteases, extracellular proteases, or cell 
surface proteases. 

The overall cell entry mechanism of MERS-CoV is similar to that of SARS-CoV. Like SARS- 
CoV spike, MERS-CoV spike must be cleaved at both the S1/S2 boundary and the S2’ site for 
membrane fusion to occur (147). MERS-CoV also enters host cells through endocytosis and is 
activated by lysosomal cysteine proteases for membrane fusion (27, 148). Moreover, extracellular 
proteases and cell surface proteases help activate MERS-CoV entry (148-150). Unlike SARS- 
CoV, MERS-CoV spike is cleaved by host proprotein convertases during virus packaging (27, 
151). Interestingly, despite recognizing the same receptor DPP4, MERS-CoV and HKU4 spikes 
differ in their activities to mediate virus entry: HKU4 spike mediates virus entry into bat cells but 
not human cells, whereas MERS-CoV spike mediates virus entry into both bat and human cells 
(152). Two residue differences have been identified between MERS-CoV and HKU4 spikes that 
account for this functional difference (152); they allow MERS-CoV spike, but not HKU4 spike, 
to be activated by human proprotein convertases and lysosomal cysteine proteases. Thus, the 
corresponding two mutations played a critical role in the transmission of MERS-CoV from bats, 
its likely natural reservoir, to humans, either directly or through intermediate host camels. On the 
other hand, HKU4 spike can be activated by bat lysosomal proteases but not human lysosomal 
proteases, suggesting that human and bat lysosomal proteases process viral spikes differently. 
These studies on MERS-CoV entry reveal that different activities of spike-processing proteases 
from different hosts can pose a barrier for cross-species transmissions of viruses. 

In sum, proteolysis has been established as an essential trigger for coronavirus spikes to fuse 
membranes, as cleavages at the S1/S2 boundary and S2’ site can remove the structural constraint of 
S1 on S2 and release the internal fusion peptide, respectively. Among the host proteases, lysosomal 
proteases provide the most reliable source for spike processing because they are ubiquitous and 
abundant in many cell types. The availability of some other proteases (e.g., proprotein convertases, 
extracellular proteases, and cell surface proteases) depends on the types of cells and tissues, regu- 
lating tissue tropisms of coronaviruses. Moreover, protease activities from different host species 
may vary, regulating host ranges of coronaviruses. Some other triggers for coronavirus entry (e.g., 
receptor binding and low pH) may depend on specific coronaviruses or different strains of the 
same coronavirus. The overall goal of these triggers is to overcome the energy barrier for the 
conformational transition of coronavirus spikes. 


EVOLUTION OF CORONAVIRUS SPIKE PROTEINS 


Coronavirus spikes, like other class I viral membrane fusion proteins, are amazing molecules. They 
single-handedly lead coronaviruses to enter host cells by first binding to a receptor on the host cell 
surface and then fusing the viral and host membranes. They exist in two distinct conformations: 
The prefusion trimeric spike contains three receptor-binding S1 heads and a trimeric membrane- 
fusion S2 stalk, whereas the postfusion trimeric S2 is a six-helix bundle with exposed fusion 
peptides. The transition of the spikes from prefusion to postfusion conformation is regulated by 
a variety of triggers. Both receptor recognition and membrane fusion are critical determinants 
of the host range and tissue tropism of coronaviruses. How have these complex structures and 
functions of coronavirus spikes evolved? 

Structure determinations of coronavirus S1 domains provide insight into the evolution of 
coronavirus S1. The finding that betacoronavirus S1-NTDs have a galectin fold indicates a host 
origin of coronavirus S1-NTDs. The origin of coronavirus S1-CTDs is less clear. Alphacoro- 
navirus S1-CTDs and host galectins also share some similarity in the structural topologies of 
their β-sandwich folds (Figure 8a), although this similarity is less significant than that between 
S1-NTDs and host galectins (Figure 5e,f). β-Sandwich folds are common and stable structures, 
and two β-sandwich folds may result from convergent evolution with protein stability as the 
evolutionary driving force. However, two β-sandwich folds with related structural topologies may 
indicate a common evolutionary ancestor when a significant number of their constituent β-strands 
are connected in the same order. Thus, there is a possibility that S1-CTD and host galectins are 
evolutionarily related. One possible scenario is that after S1-NTD was generated through gene 
capture, S1-CTD was generated through gene duplication of S1-NTD (Figure 8b). S1-CTDs 
appear to evolve at a quickened pace, as evidenced by the different tertiary structures between 
alpha- and betacoronavirus S1-CTDs. This may be associated with their location on the very top 
of the prefusion trimeric spike (Figure 2), which is the most protruding and exposed area on 
virions. Hence, S1-CTDs are under heavy selective pressure to escape host immune surveillance. 
The resulting fast-paced evolution of S1-CTDs may have permanently erased their evolution- 
ary traces, except for the limited information from their structural topology. Whether S1-CTDs 
originated from host galectins or not, the two-domain structure in S1 gives coronaviruses two 
potential receptor-binding domains, with the more structurally and functionally conserved S1-NTD 
using sugar as the fallback receptor and the more aggressively evolving S1-CTD exploiting novel 
protein receptors (Figure 8). 

Figure 8 
Evolution of coronavirus spikes. (a) Structural comparison between human galectins and alphacoronavirus HCoV-NL63 S1-CTD. 
Both the crystal structures and structural topologies of the two proteins are shown. Common subcore structures in the two proteins are 
highlighted with gray shading. (b) Hypothesized evolution of coronavirus spike proteins. Abbreviations: HCoV-NL63, human 
coronavirus NL63; IC, intracellular tail; S1-CTD, S1 C-terminal domain; S1-NTD, S1 N-terminal domain; TM, transmembrane 
anchor. 

The structural and functional similarities between coronavirus S2 and other class I viral mem- 
brane fusion proteins are profound. These proteins all exist in prefusion and postfusion con- 
formations. Their prefusion structures can be triggered in a number of similar ways, undergo 
similar conformational rearrangement, and transition to highly similar postfusion six-helix bun- 
dle structures with exposed fusion peptides. Although it cannot be completely ruled out that the 
same membrane fusion mechanism evolved independently in these viruses, the complexity and 
intricacy of this mechanism indicate that class I viral membrane fusion proteins likely share a 
common evolutionary ancestor. 

Which function evolved first for coronaviruses: receptor recognition by S1, membrane fusion 
by S2, or both simultaneously? Because coronaviruses must enter cells for replication, membrane 
fusion is the central function of coronavirus spikes. Receptor recognition, though, can specifically 
attach coronaviruses to host cell surfaces and position the spikes within striking distance of target 
host membranes. The spike of neurotropic strain MHV-JHM can mediate receptor-independent 
virus entry into cells that do not express its receptor, suggesting that receptor binding can be 
circumvented under some extreme situations. Therefore, the primordial form of coronavirus spikes 
might have contained S2 only (Figure 8b). Such a primordial spike might function inefficiently 
because the ancestral virus would have to diffuse nonspecifically to enter into close proximity of 
target cells in order for membrane fusion to occur. Later, the spike would evolve a galectin-like 
S1-NTD through gene capture, which would enhance its efficiency in mediating virus entry. 
Next, the spike would evolve an S1-CTD through gene duplication of its S1-NTD or some other 
mechanism, which would further strengthen its receptor recognition function. Understanding the 
structure and function of coronavirus spikes and their evolution can enhance our understanding 
of the origin of viruses and the evolutionary relationship between viruses and host cells. 